Semantic Suffix Net Clustering for Search Results

Authors

  • Jongkol Janruang
  • Sumanta Guha
Abstract

Suffix Tree Clustering (STC) uses a suffix tree to find sets of snippets that share a common phrase and proposes clusters from them. STC is therefore a fast, incremental algorithm for automatic clustering and labeling, but it cannot cluster snippets that are semantically similar without sharing text. Word meaning, however, is an important property that relates a word to other words even when their text strings do not match. In this paper, we propose a new semantic search results clustering algorithm called semantic suffix net clustering (SSNC), which is based on the semantic suffix net (SSN) structure. The proposed algorithm uses a net pruning technique to merge related suffixes through their suffix links in order to find base clusters. As a result, both string matching and word meaning serve as conditions for clustering. Experimental results show that the proposed algorithm has lower time complexity than CFWMS, SSTC, and STC+GSSN, which are current semantic search results clustering methods. Moreover, the F-measure of the proposed algorithm is similar to that of the original STC, CFWMS, and STC+GSSN, and higher than that of MSRC and SSTC.
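The shared-phrase idea behind STC base clusters can be illustrated with a minimal sketch. It is not the authors' SSNC algorithm and it does not build a real suffix tree or suffix net: a plain dictionary keyed by word-level phrases stands in for the tree, and no semantic similarity is applied. The function names (`word_suffixes`, `base_clusters`), the `min_docs` and `max_phrase_len` parameters, and the sample snippets are assumptions made for illustration only.

```python
from collections import defaultdict


def word_suffixes(snippet):
    """Return every word-level suffix of a snippet as a tuple of words."""
    words = snippet.lower().split()
    return [tuple(words[i:]) for i in range(len(words))]


def base_clusters(snippets, min_docs=2, max_phrase_len=4):
    """Map each phrase shared by at least `min_docs` snippets to their indices.

    A dictionary replaces the suffix tree here (a simplification): every
    prefix of every suffix is a phrase that occurs in the snippet.
    """
    phrase_docs = defaultdict(set)
    for doc_id, snippet in enumerate(snippets):
        for suffix in word_suffixes(snippet):
            longest = min(len(suffix), max_phrase_len)
            for n in range(1, longest + 1):
                phrase_docs[suffix[:n]].add(doc_id)
    return {
        " ".join(phrase): docs
        for phrase, docs in phrase_docs.items()
        if len(docs) >= min_docs
    }


# Hypothetical snippets, chosen only to show phrase sharing.
snippets = [
    "cat ate cheese",
    "mouse ate cheese too",
    "cat ate mouse too",
]
clusters = base_clusters(snippets)
for phrase in sorted(clusters):
    print(phrase, sorted(clusters[phrase]))
```

Each key doubles as a candidate cluster label, which is why phrase-based clustering yields labels without extra work. Snippets that express the same idea with different words never share a key in this sketch, which is the limitation the abstract attributes to STC.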


Similar resources

Improving Web Search Results Using Semantic Clustering

This paper considers the problem of search engines that are not capable of retrieving appropriate results for a given query. Most users are not able to formulate a query that retrieves exactly what they want. So the search engine retrieves a massive list of data, which is ranked by the page rank algorithm, a relevancy algorithm, or a human judgment algorithm. If the relevant resul...


Semantic Suffix Tree Clustering

This paper proposes a new algorithm, called Semantic Suffix Tree Clustering (SSTC), to cluster web search results containing semantic similarities. The distinctive methodology of the SSTC algorithm is that it simultaneously constructs the semantic suffix tree through an on-depth and on-breadth pass by using semantic similarity and string matching. The semantic similarity is derived from the Wor...


Clustering of Web Search Results Using Semantic

Clustering is a data mining technique used in information retrieval. Clustering documents allows relevant information to be retrieved quickly. It organizes the documents into groups, each containing documents with similar content. Different algorithms are used to cluster documents, such as partitional clustering (K-means clustering) and hierarchical clustering (...


A semantics-based method for clustering of Chinese web search results

Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system operates over the Internet, the amount of information on the web has been increasing exponentially. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable informa...


Semantic, Hierarchical, Online Clustering of Web Search Results

Today, the search engine is the most commonly used tool for Web information retrieval; however, its current state is still far from satisfactory. This paper focuses on clustering Web search results in order to help users find relevant Web information more easily and quickly. The main contributions of this paper include the following. (1) The benefits of using key phrases as natural language inform...



Publication date: 2012